Stencil Code
   HOME

TheInfoList



OR:

Iterative Stencil Loops (ISLs) are a class of numerical data processing solution Roth, Gerald et al. (1997) Proceedings of SC'97: High Performance Networking and Computing.
Compiling Stencils in High Performance Fortran.
'
which update array elements according to some fixed pattern, called a stencil. Sloot, Peter M.A. et al. (May 28, 2002)
Computational Science – ICCS 2002: International Conference, Amsterdam, The Netherlands, April 21–24, 2002. Proceedings, Part I.
' Page 843. Publisher: Springer. .
They are most commonly found in
computer simulation Computer simulation is the process of mathematical modelling, performed on a computer, which is designed to predict the behaviour of, or the outcome of, a real-world or physical system. The reliability of some mathematical models can be dete ...
s, e.g. for
computational fluid dynamics Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical analysis and data structures to analyze and solve problems that involve fluid flows. Computers are used to perform the calculations required to simulate th ...
in the context of scientific and engineering applications. Other notable examples include solving
partial differential equations In mathematics, a partial differential equation (PDE) is an equation which imposes relations between the various partial derivatives of a multivariable function. The function is often thought of as an "unknown" to be solved for, similarly to ...
, the
Jacobi Jacobi may refer to: * People with the surname Jacobi (surname), Jacobi Mathematics: * Jacobi sum, a type of character sum * Jacobi method, a method for determining the solutions of a diagonally dominant system of linear equations * Jacobi eigenva ...
kernel, the
Gauss–Seidel method In numerical linear algebra, the Gauss–Seidel method, also known as the Liebmann method or the method of successive displacement, is an iterative method used to solve a system of linear equations. It is named after the German mathematicians Carl ...
,
image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
and
cellular automata A cellular automaton (pl. cellular automata, abbrev. CA) is a discrete model of computation studied in automata theory. Cellular automata are also called cellular spaces, tessellation automata, homogeneous structures, cellular structures, tessel ...
. Fey, Dietmar et al. (2010)
Grid-Computing: Eine Basistechnologie für Computational Science
'. Page 439. Publisher: Springer.
The regular structure of the arrays sets stencil techniques apart from other modeling methods such as the
Finite element method The finite element method (FEM) is a popular method for numerically solving differential equations arising in engineering and mathematical modeling. Typical problem areas of interest include the traditional fields of structural analysis, heat ...
. Most finite difference codes which operate on regular grids can be formulated as ISLs.


Definition

ISLs perform a sequence of sweeps (called timesteps) through a given array. Generally this is a 2- or 3-dimensional regular grid. The elements of the arrays are often referred to as cells. In each timestep, all array elements are updated. Using neighboring array elements in a fixed pattern (the stencil), each cell's new value is computed. In most cases boundary values are left unchanged, but in some cases (e.g. LBM codes) those need to be adjusted during the computation as well. Since the stencil is the same for each element, the pattern of data accesses is repeated. More formally, we may define ISLs as a 5-tuple (I, S, S_0, s, T) with the following meaning: * I = \prod_^k , \ldots, n_i/math> is the index set. It defines the topology of the array. * S is the (not necessarily finite) set of states, one of which each cell may take on on any given timestep. * S_0\colon \Z^k \to S defines the initial state of the system at time 0. * s \in \prod_^l \Z^k is the stencil itself and describes the actual shape of the neighborhood. There are l elements in the stencil. * T\colon S^l \to S is the transition function which is used to determine a cell's new state, depending on its neighbors. Since ''I'' is a ''k''-dimensional integer interval, the array will always have the topology of a finite regular grid. The array is also called simulation space and individual cells are identified by their index c \in I. The stencil is an ordered set of l relative coordinates. We can now obtain for each cell c the tuple of its neighbors indices I_c : I_c = \ \, Their states are given by mapping the tuple I_c to the corresponding tuple of states N_i(c), where N_i\colon I \to S^l is defined as follows: : N_i(c) = (s_1, \ldots, s_l) \text s_j = S_i(I_c(j)) \, This is all we need to define the system's state for the following time steps S_\colon \Z^k \to S with i \in \N: : S_(c) = \beginT(N_i(c)), & c \in I\\ S_i(c), & c \in \Z^k \setminus I \end Note that S_i is defined on \Z^k and not just on I since the boundary conditions need to be set, too. Sometimes the elements of I_c may be defined by a vector addition modulo the simulation space's dimension to realize toroidal topologies: : I_c = \ This may be useful for implementing
periodic boundary conditions Periodic boundary conditions (PBCs) are a set of boundary conditions which are often chosen for approximating a large (infinite) system by using a small part called a ''unit cell''. PBCs are often used in computer simulations and mathematical mode ...
, which simplifies certain physical models.


Example: 2D Jacobi iteration

To illustrate the formal definition, we'll have a look at how a two dimensional
Jacobi Jacobi may refer to: * People with the surname Jacobi (surname), Jacobi Mathematics: * Jacobi sum, a type of character sum * Jacobi method, a method for determining the solutions of a diagonally dominant system of linear equations * Jacobi eigenva ...
iteration can be defined. The update function computes the arithmetic mean of a cell's four neighbors. In this case we set off with an initial solution of 0. The left and right boundary are fixed at 1, while the upper and lower boundaries are set to 0. After a sufficient number of iterations, the system converges against a saddle-shape. : \begin I & = , \ldots, 992 \\ S & = \R \\ S_0 &: \Z^2 \to \R \\ S_0((x, y)) & = \begin 1, & x < 0 \\ 0, & 0 \le x < 100 \\ 1, & x \ge 100 \end\\ s & = ((0, -1), (-1, 0), (1, 0), (0, 1)) \\ T &\colon \R^4 \to \R \\ T((x_1, x_2, x_3, x_4)) & = 0.25 \cdot (x_1 + x_2 + x_3 + x_4) \end


Stencils

The shape of the neighborhood used during the updates depends on the application itself. The most common stencils are the 2D or 3D versions of the
von Neumann neighborhood In cellular automata, the von Neumann neighborhood (or 4-neighborhood) is classically defined on a two-dimensional square lattice and is composed of a central cell and its four adjacent cells. The neighborhood is named after John von Neumann, ...
and
Moore neighborhood In cellular automata, the Moore neighborhood is defined on a two-dimensional square lattice and is composed of a central cell and the eight cells that surround it. Name The neighborhood is named after Edward F. Moore, a pioneer of cellular au ...
. The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant.
Conway's Game of Life The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970. It is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further ...
uses the 2D Moore neighborhood. That said, other stencils such as a 25-point stencil for seismic wave propagation Micikevicius, Paulius et al. (2009)
3D finite difference computation on GPUs using CUDA
' Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
can be found, too.


Implementation issues

Many simulation codes may be formulated naturally as ISLs. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of ISLs are of paramount importance to research.Datta, Kaushik (2009)
Auto-tuning Stencil Codes for Cache-Based Multicore Platforms
'', Ph.D. Thesis
This is challenging since the computations are tightly coupled (because of the cell updates depending on neighboring cells) and most ISLs are memory bound (i.e. the ratio of memory accesses to calculations is high). Wellein, G et al. (2009)
Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization
', 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009
Virtually all current parallel architectures have been explored for executing ISLs efficiently; Datta, Kaushik et al. (2008)
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
'' SC '08 Proceedings of the 2008 ACM/IEEE conference on Supercomputing
at the moment
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
s have proven to be most efficient. Schäfer, Andreas and Fey, Dietmar (2011)
High Performance Stencil Code Algorithms for GPGPUs
', Proceedings of the International Conference on Computational Science, ICCS 2011


Libraries

Due to both the importance of ISLs to
computer simulation Computer simulation is the process of mathematical modelling, performed on a computer, which is designed to predict the behaviour of, or the outcome of, a real-world or physical system. The reliability of some mathematical models can be dete ...
s and their high computational requirements, there are a number of efforts which aim at creating reusable libraries to support scientists in performing stencil-based computations. The libraries are mostly concerned with the parallelization, but may also tackle other challenges, such as IO,
steering Steering is a system of components, linkages, and other parts that allows a driver to control the direction of the vehicle. Introduction The most conventional steering arrangement allows a driver to turn the front wheels of a vehicle using ...
and
checkpointing Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly imp ...
. They may be classified by their API.


Patch-based libraries

This is a traditional design. The library manages a set of ''n''-dimensional scalar arrays, which the user program may access to perform updates. The library handles the synchronization of the boundaries (dubbed ghost zone or halo). The advantage of this interface is that the user program may loop over the arrays, which makes it easy to integrate legacy code S. Donath, J. Götz, C. Feichtinger, K. Iglberger and U. Rüde (2010)
waLBerla: Optimization for Itanium-based Systems with Thousands of Processors
', High Performance Computing in Science and Engineering, Garching/Munich 2009
. The disadvantage is that the library can not handle cache blocking (as this has to be done within the loops Nguyen, Anthony et al. (2010)
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
', SC '10 Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
) or wrapping of the API-calls for accelerators (e.g. via CUDA or OpenCL). Implementations includ
Cactus
a physics problem solving environment, an
waLBerla


Cell-based libraries

These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed, e.g. via getter/setter methods. The advantage of this approach is that the library can control tightly which cells are updated in which order, which is useful not only to implement cache blocking, but also to run the same code on multi-cores and GPUs. Naoya Maruyama, Tatsuo Nomura, Kento Sato, and Satoshi Matsuoka (2011) ''Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers'', SC '11 Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis This approach requires the user to recompile the source code together with the library. Otherwise a function call for every cell update would be required, which would seriously impair performance. This is only feasible with techniques such as class templates or
metaprogramming Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself ...
, which is also the reason why this design is only found in newer libraries. Examples ar
Physis
an
LibGeoDecomp


See also

*
Advanced Simulation Library Advanced Simulation Library (ASL) is free and open-source hardware-accelerated multiphysics simulation platform. It enables users to write customized numerical solvers in C++ and deploy them on a variety of massively parallel architecture ...
*
Finite difference method In numerical analysis, finite-difference methods (FDM) are a class of numerical techniques for solving differential equations by approximating derivatives with finite differences. Both the spatial domain and time interval (if applicable) are di ...
*
Computer simulation Computer simulation is the process of mathematical modelling, performed on a computer, which is designed to predict the behaviour of, or the outcome of, a real-world or physical system. The reliability of some mathematical models can be dete ...
*
Five-point stencil In numerical analysis, given a square grid in one or two dimensions, the five-point stencil of a point in the grid is a stencil made up of the point itself together with its four "neighbors". It is used to write finite difference approximations to ...
*
Stencil jumping Stencil jumping, at times called stencil walking, is an algorithm to locate the grid element enclosing a given point for any structured mesh. In simple words, given a point and a structured mesh, this algorithm will help locate the grid element tha ...
*
Stencil (numerical analysis) In mathematics, especially the areas of numerical analysis concentrating on the numerical solution of partial differential equations, a stencil is a geometric arrangement of a nodal group that relate to the point of interest by using a numerical a ...


References

{{reflist, 33em


External links


Physis

LibGeoDecomp

waLBerla
Computational fluid dynamics Simulation software